Europäisches Patentamt
European Patent Office
Office européen des brevets



REC'D 24 NOV 2004

WIPO PCT

Bescheinigung Certificate Attestation 



Die angehefteten Unterlagen stimmen mit der ursprünglich eingereichten
Fassung der auf dem nächsten Blatt bezeichneten europäischen
Patentanmeldung überein.



The attached documents 
are exact copies of the 
European patent application 
described on the following 
page, as originally filed. 



Les documents fixés à cette attestation sont conformes à la version
initialement déposée de la demande de brevet européen spécifiée à la
page suivante.



Patentanmeldung Nr. Patent application No. Demande de brevet n° 

03104034.8 



PRIORITY DOCUMENT

SUBMITTED OR TRANSMITTED IN 
COMPLIANCE WITH RULE 17.1(a) OR (b) 



Der Präsident des Europäischen Patentamts;
Im Auftrag

For the President of the European Patent Office

Le Président de l'Office européen des brevets
p.o.




R C van Dijk 










Anmeldung Nr. / Application no. / Demande n°: 03104034.8

Anmeldetag / Date of filing / Date de dépôt: 30.10.03



Anmelder/Applicant(s)/Demandeur(s):

Koninklijke Philips Electronics N.V. 
Groenewoudseweg 1 
5621 BA Eindhoven 
PAYS-BAS 



Bezeichnung der Erfindung/Title of the invention/Titre de l'invention:
(Falls die Bezeichnung der Erfindung nicht angegeben ist, siehe Beschreibung.
If no title is shown please refer to the description.
Si aucun titre n'est indiqué se référer à la description.)

Audio signal encoding or decoding 



In Anspruch genommene Priorität(en) / Priority(ies) claimed / Priorité(s)
revendiquée(s)

Staat/Tag/Aktenzeichen / State/Date/File no. / Pays/Date/Numéro de dépôt:



Internationale Patentklassifikation/International Patent Classification/
Classification internationale des brevets:

H03M13/00 

Am Anmeldetag benannte Vertragsstaaten/Contracting states designated at date of
filing/États contractants désignés lors du dépôt:

AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LU MC NL
PT RO SE SI SK TR LI



EPA/EPO/OEB Form 1014.2 - 01.2000 7001014

PHNL031317EPP

30.10.2003

Audio signal encoding or decoding 

The invention relates to encoding an audio signal or decoding an encoded
audio signal.

Erik Schuijers, Werner Oomen, Bert den Brinker and Jeroen Breebaart,
"Advances in Parametric Coding for High-Quality Audio", Preprint 5852, 114th AES
Convention, Amsterdam, The Netherlands, 22-25 March 2003, disclose a parametric coding
scheme using an efficient parametric representation for the stereo image. Two input signals
are merged into one mono audio signal. Perceptually relevant spatial cues are explicitly
modeled, as is shown in Fig. 1. The merged signal is encoded using a mono parametric
encoder. The stereo parameters Interchannel Intensity Difference (IID), Interchannel
Time Difference (ITD) and Interchannel Cross-Correlation (ICC) are quantized, encoded
and multiplexed into a bitstream together with the quantized and encoded mono audio signal.
At the decoder side, the bitstream is de-multiplexed to an encoded mono signal and the stereo
parameters. The encoded mono audio signal is decoded in order to obtain a decoded mono
audio signal m' (see Fig. 2). From the mono time domain signal, a de-correlated signal is
calculated using a filter D yielding perceptual de-correlation. Both the mono time domain
signal m' and the de-correlated signal d are transformed to the frequency domain. Then the
frequency domain stereo signal is processed with the IID, ITD and ICC parameters by
scaling, phase modifications and mixing, respectively, in a parameter processing unit in order
to obtain the decoded stereo pair l' and r'. The resulting frequency domain representations
are transformed back into the time domain.

An object of the invention is to provide advantageous audio encoding or
decoding using spatial parameters. To this end, the invention provides an encoding method,
an audio encoder, an apparatus for transmitting or storing, a decoding method, an audio
decoder, a reproduction apparatus and a computer program product as defined in the
independent claims. Advantageous embodiments are defined in the dependent claims.

According to a first aspect of the invention, an audio signal is encoded, the
audio signal including a first audio channel and a second audio channel, the encoding
comprising subband filtering each of the first audio channel and the second audio channel in
a complex modulated filterbank to provide a first plurality of subband signals for the first
audio channel and a second plurality of subband signals for the second audio channel,
downsampling each of the subband signals to provide a first plurality of downsampled
subband signals and a second plurality of downsampled subband signals, further subband
filtering at least one of the downsampled subband signals in a further filterbank in order to
provide a plurality of sub-subband signals, deriving spatial parameters from the sub-subband
signals and from those downsampled subband signals that are not further subband filtered,
and deriving a single channel audio signal comprising derived subband signals derived from
the first plurality of downsampled subband signals and the second plurality of downsampled
subband signals. By providing a further subband filtering in a subband, the frequency
resolution of said subband is increased. Such an increased frequency resolution has the
advantage that it becomes possible to achieve higher audio quality (the bandwidth of a single
sub-band signal is typically much higher than that of critical bands in the human auditory
system) in an efficient implementation (because only a few bands have to be transformed).
The parametric spatial coder tries to model the binaural cues, which are perceived on a
non-uniform frequency scale, resembling the Equivalent Rectangular Bandwidth (ERB) scale.
The single channel audio signal can be derived directly from the first plurality of downsampled
subband signals and the second plurality of downsampled subband signals. However, the
single channel audio signal is advantageously derived from sub-subband signals for those
downsampled subbands that are further subband filtered, in which case the sub-subband
signals of each subband are added together to form new subband signals, and the
single channel audio signal is derived from these new subband signals and the subbands from
the first and second plurality of subbands that are not further filtered.

According to another main aspect of the invention, audio decoding of an
encoded audio signal is provided, the encoded audio signal comprising an encoded single
channel audio signal and a set of spatial parameters, the audio decoding comprising decoding
the encoded single channel audio signal to obtain a plurality of downsampled subband
signals, further subband filtering at least one of the downsampled subband signals in a further
filterbank in order to provide a plurality of sub-subband signals, and deriving two audio
channels from the spatial parameters, the sub-subband signals and the downsampled subband
signals for those subbands that are not further subband filtered. By providing a further
subband filtering in a subband, the frequency resolution of said subband is increased and
consequently higher quality audio decoding can be reached.

One of the main advantages of these aspects of the invention is that parametric
spatial coding can be easily combined with Spectral Band Replication ("SBR") techniques.
SBR is known per se from Martin Dietz, Lars Liljeryd, Kristofer Kjorling and Oliver Kunz,
"Spectral Band Replication, a novel approach in audio coding", Preprint 5553, 112th AES
Convention, Munich, Germany, 10-13 May 2002, and from Per Ekstrand, "Bandwidth
extension of audio signals by spectral band replication", Proc. 1st IEEE Benelux Workshop
on Model based Processing and Coding of Audio (MPCA-2002), pp. 53-58, Leuven,
Belgium, November 15, 2002. Further reference is made to the MPEG-4 standard ISO/IEC
14496-3:2001/FDAM1, JTC1/SC29/WG11, Coding of Moving Pictures and Audio,
Bandwidth Extension, which describes an audio codec using SBR.

SBR is based on the notion that there is typically a large correlation between
the low and the high frequencies in an audio signal. As such, the SBR process consists of
copying the lower part(s) of the spectrum to the higher part(s), after which the spectral
envelope is adjusted for the higher part(s) of the spectrum using little information encoded in
the bit stream. A simplified block diagram of such an SBR enhanced decoder is shown in Fig.
3. The bit-stream is de-multiplexed and decoded into core data (e.g. MPEG-2/4 Advanced
Audio Coding (AAC)) and SBR data. Using the core data the signal is decoded at half the
sampling frequency of the full bandwidth signal. The output of the core decoder is analyzed
by means of a 32 bands complex (Pseudo) Quadrature Mirror Filter (QMF) bank. These 32
bands are then extended to full bandwidth, i.e., 64 bands, in which the High Frequency (HF)
content is generated by means of copying part(s) of the lower bands. The envelope of the
bands for which the HF content is generated is adjusted according to the SBR data. Finally, by
means of a 64 bands complex QMF synthesis bank, the PCM output signal is reconstructed.

The SBR decoder as shown in Fig. 3 is a so-called dual rate decoder. This
means that the core decoder runs at half the sampling frequency and therefore only a 32
bands analysis QMF bank is used. Single rate decoders, where the core decoder runs at the
full sampling frequency and the analysis QMF bank consists of 64 bands, are also possible. In
practice, the reconstruction is done by means of a (pseudo) complex QMF bank. Because the
complex QMF filter bank is not critically sampled, no extra provisions need to be taken in
order to account for aliasing. Note that in the SBR decoder as disclosed by Ekstrand, the
analysis QMF bank consists of only 32 bands, while the synthesis QMF bank consists of 64
bands, as the core decoder runs at half the sampling frequency compared to the entire audio
decoder. In the corresponding encoder however, a 64 bands analysis QMF bank is used to
cover the whole frequency range.

Although the invention is especially advantageous for stereo audio coding, the
invention is also of advantage to coding signals with more than two audio channels.

These and other aspects of the invention are apparent from and will be
elucidated with reference to the embodiments described hereinafter.



In the drawings:

Fig. 1 shows a block diagram of a unit for stereo parameter extraction as used
in a Parametric Stereo ("PS") encoder;

Fig. 2 shows a block diagram of a unit for the reconstruction of a stereo signal
as used in a PS decoder;

Fig. 3 shows a block diagram of a Spectral Band Replication ("SBR") decoder;

Fig. 4 shows a block diagram of a combined PS and SBR enhanced encoder
according to an embodiment of the invention;

Fig. 5 shows a block diagram of a combined PS and SBR enhanced decoder
according to an embodiment of the invention;

Fig. 6 shows an M bands downsampled complex QMF analysis (left) and
synthesis bank (right);

Fig. 7 shows a magnitude response in dB of a prototype filter;

Fig. 8 shows the magnitude responses in dB of the first four out of 64
non-downsampled complex modulated analysis filters;

Fig. 9 shows a block diagram of a Q bands filter bank with trivial synthesis;

Fig. 10 shows a combined magnitude response in dB of a first
non-downsampled modulated QMF filter and 8 bands complex modulated filter bank;

Fig. 11 shows a stylized magnitude response of a 4 bands evenly stacked filter
bank (top) and oddly stacked filter bank (bottom) according to an embodiment of the
invention;

Fig. 12 shows a 77 bands non-uniform hybrid analysis filter bank based on 64
bands complex analysis QMF according to an embodiment of the invention;

Fig. 13 shows a 71 bands non-uniform hybrid analysis filter bank based on 64
bands complex analysis QMF for use in an audio decoder; and

Fig. 14 shows a block diagram of an efficient implementation of the complex
modulated analysis filter bank.

The drawings only show those elements that are necessary to understand the
invention.



Combining SBR with PS potentially yields an extremely powerful codec. Both
SBR and PS are post-processing algorithms in a decoder consisting of a fairly similar
structure, i.e., some form of time to frequency conversion, processing and finally frequency
to time conversion. When combining both algorithms, it is required that both algorithms can
run concurrently on e.g. a DSP application. Hence, it is advantageous to reuse as much as
possible of the calculated intermediate results of one codec for the other. In the case of
combining PS with SBR this leads to reusing the complex (Pseudo) QMF sub-band signals
for PS processing. In a combined encoder (see Fig. 4) the stereo input signal is analyzed by
means of two 64 bands analysis filter banks. Using the complex sub-band domain
representation, a PS calculation unit estimates the stereo parameters and creates a mono
(sub-band) down-mix. This mono down-mix is then fed to an SBR parameter estimation
unit. Finally the mono down-mix is converted back to the time domain by means of a 32
bands synthesis filter bank such that it can be coded by the core coder (the core coder needs
only half the bandwidth).

In the combined decoder as shown in Fig. 5, regardless of whether a dual
rate or a single rate system is being used, the full bandwidth (64 bands) subband domain
signals after envelope adjustment are converted to a stereo set of subband domain signals
according to the stereo parameters. These two sets of sub-band signals are finally converted
to the time domain by means of the 64 bands synthesis QMF bank. If one would just combine
PS with SBR, the bandwidth of the lower frequency bands of the QMF filter is larger than
what is required for a high quality stereo representation. So, in order to be able to give a high
quality representation of the stereo image, a further sub-division of the lower sub-band
signals is performed according to advantageous embodiments of the invention.

For a better understanding of aspects of the invention, the theory behind
complex QMF sub-band filters is first explained.

QMF sub-band filters

The QMF analysis sub-band filter can be described as follows. Given a real
valued linear phase prototype filter $p(v)$, an $M$-band complex modulated analysis filter
bank can be defined by the analysis filters

$$h_k(v) = p(v)\exp\left\{ i\frac{\pi}{M}\left(k+\tfrac{1}{2}\right)(v-\theta) \right\} \qquad (1)$$

for $k = 0, 1, \ldots, M-1$. The phase parameter $\theta$ is not important for the analysis that follows,
but a typical choice is $(N+M)/2$, where $N$ is the prototype filter order. Given a real
valued discrete time signal $x(v)$, the sub-band signals $v_k(n)$ are obtained by filtering
(convolution) $x(v)$ with $h_k(v)$, and then downsampling the result by a factor $M$ (see left
hand side of Fig. 6).

A synthesis operation consists of first upsampling the QMF sub-band signals
with a factor $M$, followed by filtering with complex modulated filters of the type (1), adding
up the results and finally taking twice the real part (see right hand side of Fig. 6). Then
near-perfect reconstruction of real valued signals can be obtained by suitable design of a real
valued linear phase prototype filter $p(v)$. The magnitude response of the prototype filter as
used in the SBR system of the MPEG-4 standard (referred to above) in case of 64 bands is
shown in Fig. 7. The magnitude responses of the 64 complex modulated analysis filters are
obtained by shifting the magnitude response of the prototype filter $p(v)$ by $\frac{\pi}{M}(k+1/2)$.
Part of these responses is shown in Fig. 8. Note that only the positive frequencies are filtered,
except for $k = 0$ and $k = M-1$. As a result the sub-band signals prior to downsampling are
close to being analytic, facilitating easy amplitude and phase modifications of real-valued
sinusoids. Phase modifications are also possible for the first and last band as long as the
sinusoids residing in these bands have a frequency that is above $\pi/2M$ or below $\pi - \pi/2M$
respectively. For frequencies outside this region the performance of phase modification
deteriorates rapidly because of interference of the negative frequencies.
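The analysis stage described above can be illustrated with a small numerical sketch. It builds the complex modulated filters of type (1) from a prototype and downsamples the filter outputs by $M$. Everything specific in it is an assumption made for illustration only: the Hann-windowed sinc prototype (not the MPEG-4 SBR prototype of Fig. 7), the small band count $M = 8$, and the helper names `prototype`, `analysis_filters` and `analyze`.

```python
import cmath
import math


def prototype(M, taps_per_band=8):
    """Illustrative low-pass prototype p(v): Hann-windowed sinc, cutoff pi/(2M).

    Assumption: this is NOT the SBR prototype; it only has the property that
    its response is small outside [-pi/M, pi/M]. Filter order N = taps_per_band*M.
    """
    N = taps_per_band * M
    p = []
    for v in range(N + 1):
        t = v - N / 2
        x = t / (2 * M)
        sinc = 1.0 if t == 0 else math.sin(math.pi * x) / (math.pi * x)
        window = 0.5 - 0.5 * math.cos(2 * math.pi * v / N)
        p.append(sinc * window / (2 * M))
    return p


def analysis_filters(p, M):
    """h_k(v) = p(v) * exp(i*pi/M*(k+1/2)*(v-theta)), with theta = (N+M)/2."""
    N = len(p) - 1
    theta = (N + M) / 2
    return [
        [p[v] * cmath.exp(1j * math.pi / M * (k + 0.5) * (v - theta))
         for v in range(N + 1)]
        for k in range(M)
    ]


def analyze(x, p, M):
    """Filter x with every h_k and keep every M-th output sample."""
    h = analysis_filters(p, M)
    bands = []
    for k in range(M):
        band = []
        for n in range(0, len(x), M):          # downsampling by M
            acc = 0j
            for v, hv in enumerate(h[k]):
                if n - v >= 0:
                    acc += hv * x[n - v]
            band.append(acc)
        bands.append(band)
    return bands
```

Feeding a real sinusoid at the centre frequency $\pi(k_0+1/2)/M$ of band $k_0$ should concentrate the energy in that band, since the filters pass essentially only positive frequencies.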




Starting from the QMF analysis filters as described above, in embodiments of
the invention, a finer frequency resolution is obtained by further filtering each downsampled
subband signal $v_k(n)$ into $Q_k$ sub-subbands. In the following the properties of the further
subband filtering will be derived.



Signal modification in the complex QMF sub-band domain

In the following, let $Z(\omega) = \sum_{n=-\infty}^{\infty} z(n)\exp(-in\omega)$ be the discrete time Fourier
transform of a discrete time signal $z(n)$. Assuming the near-perfect reconstruction property
as mentioned above and also a design where $P(\omega)$, the Fourier transform of $p(v)$,
essentially vanishes outside the frequency interval $[-\pi/M, \pi/M]$, which is the case for the
prototype filter $p(v)$ as illustrated above, the next step here is to consider a system where the
sub-band signals $v_k(n)$ are modified prior to synthesis. Now, let each sub-band $k$ be
modified by filtering with a filter $B_k(\omega)$. With the extending definition

$$B_k(\omega) = B_{-1-k}(-\omega)^{*} \quad \text{for } k < 0, \qquad (2)$$

where the star denotes complex conjugation, it can then be shown (neglecting overall delay,
assuming a real valued input and a single rate system) that the resulting system including
filter bank synthesis corresponds to a filtering with the filter

$$B(\omega) = \sum_{k=-M}^{M-1} B_k(M\omega)\left| P\!\left(\omega - \pi\left(k+\tfrac{1}{2}\right)/M\right) \right|^2. \qquad (3)$$



According to the hypotheses regarding the properties of $P(\omega)$, inserting
$B_k(\omega) = 1$ for all $k$ in (3) leads to $B(\omega) = 1$, and a squared sum identity follows for the
shifted prototype filter responses. By choosing real-valued constants $B_k(\omega) = b_k \ge 0$ the
system acts as an equalizer, which interpolates the gain values $b_k$ at frequencies
$\pi(k+1/2)/M$. The attractive feature is that the overall system is time-invariant, that is, free
of aliasing, in spite of the use of down- and upsampling. This will of course only be true up
to the amount of deviation from the stated prototype filter hypotheses.

In order to derive a mono audio signal, additional sub-filtering of the complex
sub-band signals should not only preserve these properties, but also extend these properties to
manipulation of the filtered sub-band signals. Sub-filtering preserving these properties can be
performed using a modification of so-called Mth band filters as known per se from P.P.
Vaidyanathan, "Multirate systems and filter banks", Prentice Hall Signal Processing Series,
1993, sections 4.6.1-4.6.2.

Modulated filter banks with trivial synthesis

A discrete time signal $v(n)$ can be split into $Q$ different signals by a bank of
filters with impulse responses $g_q(n)$, $q = 0, 1, \ldots, Q-1$. This is illustrated in Fig. 9.
Let the corresponding analysis outputs be $y_q(n)$, and consider the trivial synthesis operation

$$y(n) = \sum_{q=0}^{Q-1} y_q(n). \qquad (4)$$

Perfect reconstruction, $y(n) = v(n)$, is then obtained by choosing the filters such that

$$\sum_{q=0}^{Q-1} g_q(n) = \delta(n), \qquad (5)$$

where $\delta(n) = 1$ if $n = 0$, and $\delta(n) = 0$ if $n \ne 0$. For causal filters, the right hand side of (5)
would have to be replaced with $\delta(n-d)$ where $d$ is a positive delay, but this straightforward
modification is omitted for clarity of exposition.

The filters $g_q(n)$ can be chosen as complex modulations of a prototype filter $g(n)$ through

$$g_q(n) = g(n)\exp\left\{ i\frac{2\pi}{Q}\left(q+\tfrac{1}{2}\right)n \right\}. \qquad (6)$$

In this preferred embodiment of the invention, the filters are oddly stacked (the
factor $q + 1/2$). An advantage of this preferred embodiment will be explained later. Perfect
reconstruction (5) is obtained if and only if

$$g(Qn) = \delta(n)/Q. \qquad (7)$$

A variation of this is the real-valued cosine modulation as

$$g_q(n) = g(n)\cos\left\{ \frac{\pi}{Q}\left(q+\tfrac{1}{2}\right)n \right\}, \qquad (8)$$

with a real-valued prototype filter $g(n)$ satisfying

$$g(2Qn) = \delta(n)/Q. \qquad (9)$$

(This is easily obtained by consideration of $g_q(n) + g_{Q-1-q}(n)$ in (6).)
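Condition (5) for the oddly stacked complex modulation (6) can be checked numerically. The sketch below is only an illustration under stated assumptions: the prototype is a Hann-windowed sinc chosen so that it satisfies (7), it is non-causal (indexed from `-half_len` to `half_len`), and the function names are hypothetical.

```python
import cmath
import math


def qth_band_prototype(Q, half_len):
    """Illustrative real prototype g(n), n = -half_len..half_len, with g(Qn) = delta(n)/Q.

    Assumption: a Hann-windowed sinc; any prototype satisfying (7) would do.
    """
    g = {}
    for n in range(-half_len, half_len + 1):
        if n == 0:
            sinc = 1.0
        elif n % Q == 0:
            sinc = 0.0                      # exact zeros at nonzero multiples of Q
        else:
            sinc = math.sin(math.pi * n / Q) / (math.pi * n / Q)
        window = 0.5 + 0.5 * math.cos(math.pi * n / (half_len + 1))
        g[n] = sinc * window / Q
    return g


def modulated_bank(g, Q):
    """g_q(n) = g(n) * exp(i*2*pi/Q*(q+1/2)*n), the oddly stacked modulation (6)."""
    return [
        {n: gn * cmath.exp(2j * math.pi / Q * (q + 0.5) * n) for n, gn in g.items()}
        for q in range(Q)
    ]


def summed_response(bank):
    """Sum of all filter impulse responses, i.e. the left hand side of (5)."""
    return {n: sum(f[n] for f in bank) for n in bank[0]}
```

For such a prototype the summed impulse response should equal $\delta(n)$ up to rounding, which is what makes the trivial synthesis (4) a perfect reconstruction.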



Sub-filtering the complex-exponential modulated filter bank

Starting from the QMF analysis filters as described above, a finer frequency
resolution is obtained by further filtering each downsampled subband signal $v_k(n)$ into $Q_k$
sub-subbands by using one of the modulated structures (6) or (8) above. Denote the resulting
output signals $y_q^k(n)$, and let $g_q^k(n)$ describe the filter bank applied within sub-band $k$. If
$Q_k = 1$, there is no filtering and $g_0^k(n) = \delta(n)$. A typical application example is the case
where $M = 64$, $Q_0 = 8$, $Q_k = 4$ for $k = 1, 2$, and $Q_k = 1$ for $k > 2$.

The combined effect of the two filter banks from $x(v)$ to $y_q^k(n)$ can be
described as filtering with filters $F_q^k(\omega)$ followed by downsampling by a factor $M$, where

$$F_q^k(\omega) = H_k(\omega)\, G_q^k(M\omega). \qquad (10)$$

If the prototype filter response $P(\omega)$ is essentially zero outside the interval $[-\pi/M, \pi/M]$,
which is the case for the SBR analysis filters (see Fig. 7), then the filter $F_q^k(\omega)$ has a single
nominal center frequency defined in the complex modulated case by

$$\omega_{k,q} = 2\pi\left(q + Q_k s + \tfrac{1}{2}\right) / (M Q_k), \qquad (11)$$

where $s$ is an integer chosen such that $Q_k(k - 1/2) \le 2(q + Q_k s) + 1 \le Q_k(k + 3/2)$. For example, as
illustrated in Fig. 10, if $k = 0$ and $Q_0 = 8$, the values of $\omega_{0,0}, \omega_{0,1}, \ldots, \omega_{0,7}$ are

$$\frac{\pi}{8M}\,(1, 3, 5, 7, 9, 11, -3, -1).$$
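The centre frequencies of (11) can be computed with a few lines of code. This is a minimal sketch, assuming that the constraint on $s$ is the inclusive inequality $Q_k(k-1/2) \le 2(q+Q_k s)+1 \le Q_k(k+3/2)$ and that the first matching $s$ is taken; the function names are hypothetical, and the sketch is intended for $Q_k > 1$.

```python
import math
from fractions import Fraction


def centre_multiples(k, Qk):
    """Odd integers c_q = 2*(q + Qk*s) + 1, so that w_{k,q} = c_q * pi / (M * Qk).

    For each q the first integer s placing c_q inside
    [Qk*(k - 1/2), Qk*(k + 3/2)] is used (assumed reading of the constraint).
    """
    lo = Fraction(Qk * (2 * k - 1), 2)
    hi = Fraction(Qk * (2 * k + 3), 2)
    out = []
    for q in range(Qk):
        for s in range(-1, k + 2):          # s is close to k/2, so this range suffices
            c = 2 * (q + Qk * s) + 1
            if lo <= c <= hi:
                out.append(c)
                break
    return out


def centre_frequencies(k, Qk, M):
    """Nominal centre frequencies w_{k,q} in radians, following (11)."""
    return [c * math.pi / (M * Qk) for c in centre_multiples(k, Qk)]
```

For $k = 0$ and $Q_0 = 8$ the multiples come out as $(1, 3, 5, 7, 9, 11, -3, -1)$, i.e. the last two sub-subbands sit at negative frequencies.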



Signal modification with non-uniform frequency resolution

The insertion of sub-subband filter banks as described above does not
introduce further downsampling, so the alias-free performance of signal modification as
shown above in the case of complex QMF only, is preserved. Consider the general combined
operation of $M$-subband analysis, further subband filtering by using $Q_k$ sub-subbands
within subband $k$, filtering of each sub-subband signal $y_q^k(n)$ by a filter $A_q^k(\omega)$, synthesis
within each subband $k$ by summation, and finally synthesis through the $M$-band synthesis
bank. The overall transfer function of such a system is given by (3) with, for $k \ge 0$,

$$B_k(\omega) = \sum_{q=0}^{Q_k-1} A_q^k(\omega)\, G_q^k(\omega). \qquad (12)$$

For $\omega > \pi/(2M)$, this gives

$$B(\omega) = \sum_{k=0}^{M-1}\sum_{q=0}^{Q_k-1} A_q^k(M\omega)\, G_q^k(M\omega)\left| P\!\left(\omega - \pi\left(k+\tfrac{1}{2}\right)/M\right) \right|^2, \qquad (13)$$

so the throughput response of the sub-subband $(k,q)$ is $G_q^k(M\omega)\left| P(\omega - \pi(k+1/2)/M) \right|^2$.
For $|\omega| \le \pi/(2M)$, some care has to be taken due to (2). In this frequency range it holds that

$$B(\omega) = B_0(M\omega)\left| P\!\left(\omega - \pi/(2M)\right) \right|^2 + B_0(-M\omega)^{*}\left| P\!\left(\omega + \pi/(2M)\right) \right|^2, \qquad (14)$$

and assuming real sub-subband prototype filter coefficients, it holds that

$$G_q^0(-\omega)^{*} = G_{Q_0-1-q}^0(\omega), \qquad (15)$$

so if the modifying filters are chosen such that

$$A_q^0(-\omega)^{*} = A_{Q_0-1-q}^0(\omega), \qquad (16)$$

then $B_0(-M\omega)^{*} = B_0(M\omega)$ and the squared sum identity mentioned in connection with (3)
leads to

$$B(\omega) = B_0(M\omega) = \sum_{q=0}^{Q_0-1} A_q^0(M\omega)\, G_q^0(M\omega) \qquad (17)$$

for $|\omega| \le \pi/(2M)$, corresponding to a throughput response $G_q^0(M\omega)$ for sub-subband $(0,q)$.

Equations (15) to (17) indicate the desire to discriminate between positive
and negative frequencies. This is the reason why oddly stacked (complex) filters are being
used for sub-filtering the QMF sub-band signals instead of evenly stacked (complex) filters
(see Fig. 11). For evenly stacked filters it is not possible to apply phase modifications of
sinusoids residing in the centre filter, i.e., the filter with a centre frequency of zero, as there is
no discrimination between positive and negative frequencies possible. Assuming a prototype
filter with a response $G(\omega)$ band limited to $[-2\pi/Q, 2\pi/Q]$, with $Q$ the number of bands,
for the evenly stacked case the lower limit to which phase modifications can approximately
be applied is $2\pi/Q$, whereas for the oddly stacked case the lower limit to which phase
modifications can approximately be applied is $\pi/Q$.

As mentioned in the introduction, for PS synthesis important special cases of
the above are equalization and phase modification. For equalization,
$A_q^k(\omega) = a_q^k \ge 0$ and the condition (16) reduces to

$$a_q^0 = a_{Q_0-1-q}^0. \qquad (18)$$

The phase modification case corresponds to $A_q^k(\omega) = \exp(i\alpha_q^k)$, in which case the condition
(16) is satisfied if

$$\alpha_{Q_0-1-q}^0 = -\alpha_q^0. \qquad (19)$$



Stereo parameter estimation

The non-uniform complex filter bank, i.e. the QMF bank followed by the
further subband filtering, as described above, can be applied to estimate the stereo parameters
Inter-channel Intensity Differences (IID), Inter-channel Phase Differences (IPD) and
Inter-channel Cross Correlation (ICC) as shown below. Note that in this practical embodiment,
IPD is used as a practically equivalent substitute for the ITD as used in the paper of Schuijers
et al. In the combined PS encoder (see Fig. 4) the first three complex QMF channels are
sub-filtered so that in total 77 complex-valued signals are obtained (see Fig. 12).

From this point on the 77 complex-valued time-aligned left and right
sub-subband signals are denoted as $l_q^k(n)$ and $r_q^k(n)$ respectively, in accordance with the
indexing of $y_q^k(n)$.

To estimate the stereo parameters at a certain sub-band sample position $n'$ the
left, right and non-normalized cross-channel excitation are calculated as:

$$e_l(b) = \sum_{q=q_l}^{q_h}\sum_{k=k_l}^{k_h}\sum_{n=0}^{L-1} h(n)\left| l_q^k\!\left(n' - \tfrac{L}{2} + 1 + n\right) \right|^2 + \varepsilon$$

$$e_r(b) = \sum_{q=q_l}^{q_h}\sum_{k=k_l}^{k_h}\sum_{n=0}^{L-1} h(n)\left| r_q^k\!\left(n' - \tfrac{L}{2} + 1 + n\right) \right|^2 + \varepsilon \qquad (20)$$

$$e_R(b) = \sum_{q=q_l}^{q_h}\sum_{k=k_l}^{k_h}\sum_{n=0}^{L-1} h(n)\, l_q^k\!\left(n' - \tfrac{L}{2} + 1 + n\right) r_q^{k*}\!\left(n' - \tfrac{L}{2} + 1 + n\right)$$

for every stereo bin $b$, where $h(n)$ is the sub-band domain window with length $L$, $\varepsilon$ a very small
value preventing division by zero (e.g. $\varepsilon = 10^{-10}$) and $l_q^k(n)$ and $r_q^k(n)$ the left and right
sub-subband domain signals. In case of 20 stereo bins, the summation over $k$ from $k_l$ up to
and including $k_h$ and $q$ from $q_l$ up to and including $q_h$ goes as shown in Table 1. Note that
the 'negative' frequencies (e.g. $k = 0$ with $q = 4 \ldots 7$) are not included in the parameter
estimation of (20).

Table 1: Start and stop indices of summation over k and q

  b    k_l   k_h   q_l   q_h   Pass-band frequency region
  0     0     0     0     0    0 to π/256
  1     0     0     1     1    π/256 to 2π/256
  2     0     0     2     2    2π/256 to 3π/256
  3     0     0     3     3    3π/256 to π/64
  4     1     1     2     2    π/64 to 3π/128
  5     1     1     3     3    3π/128 to 2π/64
  6     2     2     0     0    2π/64 to 5π/128
  7     2     2     1     1    5π/128 to 3π/64
  8     3     3     0     0    3π/64 to 4π/64
  9     4     4     0     0    4π/64 to 5π/64
 10     5     5     0     0    5π/64 to 6π/64
 11     6     6     0     0    6π/64 to 7π/64
 12     7     7     0     0    7π/64 to 8π/64
 13     8     8     0     0    8π/64 to 9π/64
 14     9    10     0     0    9π/64 to 11π/64
 15    11    13     0     0    11π/64 to 14π/64
 16    14    17     0     0    14π/64 to 18π/64
 17    18    22     0     0    18π/64 to 23π/64
 18    23    34     0     0    23π/64 to 35π/64
 19    35    63     0     0    35π/64 to π



The summations are aligned such that the midpoint of
these signals in the summation coincides with the parameter position, hence the shift by
$-\frac{L}{2}+1$. As is clear from Table 1, only sub-subband signals and subband signals with a
positive centre frequency are used for estimating stereo parameters.



The IID, denoted as $I(b)$, the ICC, denoted as $C(b)$, and the IPD, denoted as $P(b)$, for each
stereo bin $b$ are calculated as:

$$I(b) = 10\log_{10}\!\left(\frac{e_l(b)}{e_r(b)}\right)$$

$$C(b) = \frac{|e_R(b)|}{\sqrt{e_l(b)\,e_r(b)}} \qquad (21)$$

$$P(b) = \angle e_R(b)$$

The angle in the equation $P(b) = \angle e_R(b)$ is calculated using the four quadrant arctangent
function giving values between $-\pi$ and $\pi$. Depending on target bit rate and application,
these parameters, or a subset of these parameters, are quantized and coded into the PS part of
the bit-stream.
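The parameter calculation of (20) and (21) can be sketched for a single stereo bin as follows. This is a minimal illustration, assuming that the caller has already gathered the left and right sub-subband samples belonging to the bin and to the window position into two flat lists; the function name and the default rectangular window are assumptions, not part of the described embodiment.

```python
import cmath
import math


def stereo_parameters(left, right, window=None, eps=1e-10):
    """Return (IID in dB, ICC, IPD in radians) for one stereo bin.

    left, right: complex sub-subband samples of the bin, already aligned.
    window:      weights h(n) per sample; a rectangular window is assumed if omitted.
    eps:         small value preventing division by zero, as in (20).
    """
    if window is None:
        window = [1.0] * len(left)
    e_l = sum(w * abs(a) ** 2 for w, a in zip(window, left)) + eps
    e_r = sum(w * abs(b) ** 2 for w, b in zip(window, right)) + eps
    # Non-normalized cross-channel excitation: left times conjugate of right.
    e_cross = sum(w * a * b.conjugate() for w, a, b in zip(window, left, right))
    iid = 10.0 * math.log10(e_l / e_r)
    icc = abs(e_cross) / math.sqrt(e_l * e_r)
    ipd = cmath.phase(e_cross)            # four quadrant angle in (-pi, pi]
    return iid, icc, ipd
```

As a usage example, a right channel that is a copy of the left channel at half the amplitude and lagging by 0.7 rad should give an IID of about 6 dB, an ICC close to 1 and an IPD of about 0.7 rad.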



Stereo signal synthesis

In order to keep the computational costs (in terms of RAM usage) in the
decoder as low as possible, a similar analysis structure is used. However, the first band is only
partially complex (see Fig. 13). This is obtained by summation of the middle band pairs
$G_2^0(\omega)$ and $G_5^0(\omega)$, and $G_3^0(\omega)$ and $G_4^0(\omega)$. Furthermore, the second and the third band are
two-band real-valued filter banks, which is obtained by summation of the output of $G_0^k(\omega)$
and $G_3^k(\omega)$, and summation of the output of $G_1^k(\omega)$ and $G_2^k(\omega)$ (see also the discussion in
the section about modulated filter banks). Using this simplification of the decoder filter-bank
structure, the discriminative feature between positive and negative frequencies is still
maintained by subdivision of the first sub-band filter. The decoder analysis filter bank is
shown in Fig. 13. Notice that the indexing of the first QMF filtered (sub-)subband signals is
sorted according to frequency.

The stereo (sub-)subband signals of a single frame are constructed as:

$$l_k(n) = \Lambda_{11}\, s_k(n) + \Lambda_{21}\, d_k(n)$$
$$r_k(n) = \Lambda_{12}\, s_k(n) + \Lambda_{22}\, d_k(n) \qquad (22)$$

$$l_k(n) = l_k(n)\, e^{jP_{rt}}$$
$$r_k(n) = r_k(n)\, e^{-jP_{rt}} \qquad (23)$$

with $s_k(n)$ the mono (sub-)subband signals, and $d_k(n)$ the mono de-correlated
(sub-)subband signals that are derived from the mono (sub-)subband signals $s_k(n)$ in order to
account for synthesizing the ICC parameters, $k = 0, \ldots, K-1$ the sub-band index ($K$ is the
total number of sub-bands, i.e., $K = 71$), QMF sub-band sample index $n = 0, \ldots, N-1$ with $N$
the number of sub-band samples of a frame, $\Lambda_{11}$, $\Lambda_{12}$, $\Lambda_{21}$, $\Lambda_{22}$ the scale factor
manipulation matrices and $P_{rt}$ the phase rotation manipulation matrix. The manipulation
matrices are defined as function of time and frequency and can be derived straightforwardly
from the manipulation vectors as described in the MPEG-4 standard ISO/IEC
14496-3:2001/FPDAM2, JTC1/SC29/WG11, Coding of Moving Pictures and Audio, Extension 2.



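$s_k(n)$ is defined according to Fig. 12 as resulting in Fig. 13:

$$s_0(n) = y_6^0(n)$$
$$s_1(n) = y_7^0(n)$$
$$s_2(n) = y_0^0(n)$$
$$s_3(n) = y_1^0(n)$$
$$s_4(n) = y_2^0(n) + y_5^0(n)$$
$$s_5(n) = y_3^0(n) + y_4^0(n) \qquad (24)$$
$$s_6(n) = y_0^1(n) + y_3^1(n)$$
$$s_7(n) = y_1^1(n) + y_2^1(n)$$
$$s_8(n) = y_0^2(n) + y_3^2(n)$$
$$s_9(n) = y_1^2(n) + y_2^2(n)$$
$$s_k(n) = y_0^{k-7}(n), \quad k = 10 \ldots 70$$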
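The regrouping of the 77 analysis sub-subbands into the 71 decoder bands can be written down as a lookup table. The sketch below is an illustration only: it assumes the pairing of (24) and the configuration $Q_0 = 8$, $Q_1 = Q_2 = 4$, $Q_k = 1$ otherwise, and the names `decoder_band_map` and `combine` are hypothetical.

```python
# Sub-subbands per QMF band for M = 64 (assumed configuration from the text).
Q = [8, 4, 4] + [1] * 61


def decoder_band_map():
    """For each decoder band k, list the (QMF band, sub-subband) pairs that are summed."""
    band_map = [
        [(0, 6)], [(0, 7)], [(0, 0)], [(0, 1)],   # s_0 .. s_3
        [(0, 2), (0, 5)], [(0, 3), (0, 4)],       # s_4, s_5: middle band pairs of band 0
        [(1, 0), (1, 3)], [(1, 1), (1, 2)],       # s_6, s_7: band 1 as two real bands
        [(2, 0), (2, 3)], [(2, 1), (2, 2)],       # s_8, s_9: band 2 as two real bands
    ]
    band_map += [[(k - 7, 0)] for k in range(10, 71)]   # s_10 .. s_70: plain QMF bands
    return band_map


def combine(y, band_map):
    """y maps (k, q) to a list of samples; returns the decoder signals s_k."""
    return [
        [sum(samples) for samples in zip(*(y[pair] for pair in band))]
        for band in band_map
    ]
```

Counting the entries gives 77 source signals and 71 decoder bands, with every sub-subband used exactly once.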

Synthesis of the stereo parameters takes place in accordance with the indexing of Table 2.

Table 2: Parameter indexing table

  k        i(k)   Pass-band frequency region
  0        1*     -2π/256 to -π/256
  1        0*     -π/256 to 0
  2        0      0 to π/256
  3        1      π/256 to 2π/256
  4        2      2π/256 to 3π/256
  5        3      3π/256 to π/64
  6        5      3π/128 to 2π/64
  7        4      2π/128 to 3π/128
  8        6      4π/128 to 5π/128
  9        7      5π/128 to 6π/128
  10       8      3π/64 to 4π/64
  11       9      4π/64 to 5π/64
  12       10     5π/64 to 6π/64
  13       11     6π/64 to 7π/64
  14       12     7π/64 to 8π/64
  15       13     8π/64 to 9π/64
  16-17    14     9π/64 to 11π/64
  18-20    15     11π/64 to 14π/64
  21-24    16     14π/64 to 18π/64
  25-29    17     18π/64 to 23π/64
  30-41    18     23π/64 to 35π/64
  42-70    19     35π/64 to π



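The synthesis equations thus look like:

$$
\begin{aligned}
l_k(n) &= \Lambda_{11}(i(k),n)\, s_k(n) + \Lambda_{21}(i(k),n)\, d_k(n) \\
r_k(n) &= \Lambda_{12}(i(k),n)\, s_k(n) + \Lambda_{22}(i(k),n)\, d_k(n)
\end{aligned}
\tag{25}
$$

$$
\begin{aligned}
l_k(n) &= l_k(n)\, e^{\,j P_{rt}(i(k),n)} \\
r_k(n) &= r_k(n)\, e^{-j P_{rt}(i(k),n)}
\end{aligned}
\tag{26}
$$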



Note that the sign of $P_{rt}$ changes in the equations above if a * is encountered in the table.
This is in accordance with equation (19), i.e., the inverse phase rotation has to be applied for the
negative frequencies.
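The per-sample arithmetic of equations (25) and (26), including the sign change for starred table entries, may be sketched as follows. This is a minimal Python illustration, not the applicant's implementation. The function name and argument names are hypothetical, and the four scale factors and the phase rotation are assumed to have been looked up already for the relevant parameter index and time slot.

```python
import cmath


def synthesize_sample(s, d, a11, a12, a21, a22, p_rt, starred=False):
    """Apply equations (25) and (26) to one sub-band sample.

    s, d     : mono and decorrelated sub-band samples (complex)
    a11..a22 : scale factor matrix entries for this band and time slot
    p_rt     : phase rotation in radians
    starred  : True for Table 1 entries marked with *, which flips the sign of p_rt
    """
    left = a11 * s + a21 * d      # equation (25), left channel
    right = a12 * s + a22 * d     # equation (25), right channel
    phase = -p_rt if starred else p_rt
    left *= cmath.exp(1j * phase)     # equation (26), left channel
    right *= cmath.exp(-1j * phase)   # equation (26), right channel
    return left, right
```

For a starred entry the two channels simply exchange their rotation directions, which is the inverse phase rotation required for the negative-frequency sub-subbands.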






Efficient implementation of modulated filter banks with trivial synthesis

Given a modulated filter bank with a prototype filter of length L, a direct
form implementation would require QL operations per input sample, but the fact that the
modulation in (6) is antiperiodic with period Q can be used to split the filtering into a
polyphase windowing of L operations followed by a transform of size Q for each input
sample. Please note that a polyphase representation as such is known from P.P.
Vaidyanathan, "Multirate systems and filter banks", Prentice Hall Signal Processing Series,
1993, section 4.3. The following provides an advantageous application of such a polyphase
representation according to a preferred embodiment of the invention.

The transform is a DFT followed by a phase twiddle, which is of the order of
$Q \log_2 Q$ when Q is a power of two. So a large saving is obtained in typical cases where
L is much larger than $\log_2 Q$. In the real modulated case (8), antiperiodicity of period
2Q combined with even/odd symmetries around n = 0 and n = Q can again be used for
polyphase windowing, and the transform kernel is a DCT of type III. A detailed description
for the case of complex modulation is given below.
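To give a feeling for the saving, the two operation counts can be compared numerically. The Python sketch below takes the counts at face value (QL for the direct form, L plus Q log2 Q for the polyphase form) and ignores constant factors. The function names and the values Q = 8 and L = 64 are arbitrary assumptions chosen for illustration.

```python
import math


def ops_direct(Q, L):
    """Direct form: every one of the Q filters applies all L taps per input sample."""
    return Q * L


def ops_polyphase(Q, L):
    """Polyphase form: L windowing operations plus a size-Q transform, taken as Q*log2(Q)."""
    return L + Q * math.log2(Q)


# Hypothetical example values, not taken from the application.
Q, L = 8, 64
ratio = ops_direct(Q, L) / ops_polyphase(Q, L)
```

With these example values the direct form needs 512 operations per input sample and the polyphase form about 88, and the ratio grows as L becomes large compared with log2 Q.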
An effective implementation of the sub-subfiltering, using FFT core
processing, may be realized using poly-phase decomposition of the prototype filter followed
by modulation. Assume a prototype filter g(n) of order N, where N = mQ and m is a
positive integer. This condition is not restrictive, since a prototype filter of arbitrary order can
be zero padded to fulfill the constraint. The Z-transform of the prototype filter designed for
use in a complex modulated system (6) is

$$
G(z) = \sum_{n=-N/2}^{N/2} g(n)\, z^{-n} \tag{27}
$$

This may be expressed in poly-phase notation as

$$
G(z) = \sum_{l=0}^{Q-1} E_l(z^Q)\, z^{-l} \tag{28}
$$

where

$$
E_l(z) = \sum_{n:\ |Qn+l| \le N/2} g(Qn + l)\, z^{-n} \tag{29}
$$

All filters of the filterbank are frequency-modulated versions of the prototype filter. The
Z-transform of the filter $g_q(n)$ is given by

$$
G_q(z) = G\!\left(z\, W^{q+1/2}\right) \tag{30}
$$

where

$$
W = e^{-j 2\pi / Q} \tag{31}
$$

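The expression for the output from one filter, with $X(z)$ the Z-transform of the input signal, is

$$
\begin{aligned}
Y_q(z) = G_q(z)\, X(z)
&= \sum_{l=0}^{Q-1} E_l\!\left(z^Q W^{Q(q+1/2)}\right) z^{-l}\, W^{-l(q+1/2)}\, X(z) \\
&= \sum_{l=0}^{Q-1} \left[ E_l(-z^Q)\, z^{-l}\, X(z) \right] e^{\,j\pi l/Q}\, e^{\,j 2\pi q l/Q}
\end{aligned}
\tag{32}
$$

By identifying the components of the last sum, it may be seen that the poly-phase components
process delayed versions of the input signal, which are subsequently multiplied
by a complex exponential. Finally, all the output signals $Y_q(z)$, $q = 0 \ldots Q-1$, are found by
applying an inverse FFT (without scaling factor). Fig. 14 shows the layout for the analysis
filter bank. Since the poly-phase filters in (29) are non-causal, a proper amount of delay has
to be added to all the poly-phase components.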
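The structure described by (28) to (32) can be checked numerically against a direct-form filter bank. The Python sketch below is illustrative only and rests on stated assumptions. The modulation of (6) is assumed to have the form g_q(n) = g(n) exp(j 2π (q + 1/2) n / Q), which is consistent with (30) and (31). The prototype is stored as a list covering n = -N/2 ... N/2, the input is taken as zero outside its range, and a plain inverse DFT stands in for the FFT core. All function and variable names are hypothetical.

```python
import cmath


def direct_outputs(g, half, Q, x, t):
    """Direct form: filter x with each modulated filter g_q, evaluated at time t."""
    out = []
    for q in range(Q):
        acc = 0j
        for n in range(-half, half + 1):
            if 0 <= t - n < len(x):
                # Assumed modulation: exp(j*2*pi*(q + 1/2)*n/Q)
                mod = cmath.exp(2j * cmath.pi * (q + 0.5) * n / Q)
                acc += g[n + half] * mod * x[t - n]
        out.append(acc)
    return out


def polyphase_outputs(g, half, Q, x, t):
    """Poly-phase form of (32): windowing, phase twiddle, unscaled inverse DFT."""
    u = [0j] * Q
    for tap in range(-half, half + 1):
        l = tap % Q          # poly-phase component index, 0..Q-1
        n = (tap - l) // Q   # tap = Q*n + l
        if 0 <= t - tap < len(x):
            sign = -1.0 if n % 2 else 1.0   # E_l evaluated at -z^Q
            u[l] += sign * g[tap + half] * x[t - tap]
    # Multiply each poly-phase branch by the complex exponential exp(j*pi*l/Q).
    v = [u[l] * cmath.exp(1j * cmath.pi * l / Q) for l in range(Q)]
    # Inverse DFT without scaling factor (an FFT would be used in practice).
    return [
        sum(v[l] * cmath.exp(2j * cmath.pi * q * l / Q) for l in range(Q))
        for q in range(Q)
    ]
```

Both functions return the Q outputs at time t without downsampling. The poly-phase version touches each prototype tap once per input sample, whereas the direct form touches each tap once per band.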
It should be noted that the above-mentioned embodiments illustrate rather than
limit the invention, and that those skilled in the art will be able to design many alternative
embodiments without departing from the scope of the appended claims. In the claims, any
reference signs placed between parentheses shall not be construed as limiting the claim. The
word 'comprising' does not exclude the presence of other elements or steps than those listed
in a claim. The invention can be implemented by means of hardware comprising several
distinct elements, and by means of a suitably programmed computer. In a device claim
enumerating several means, several of these means can be embodied by one and the same
item of hardware. The mere fact that certain measures are recited in mutually different
dependent claims does not indicate that a combination of these measures cannot be used to
advantage.

CLAIMS:

1. A method of encoding an audio signal, the audio signal including a first audio
channel and a second audio channel, the method comprising the steps of:

subband filtering each of the first audio channel and the second audio channel
in a complex modulated filterbank to provide a first plurality of subband signals for the first
audio channel and a second plurality of subband signals for the second audio channel,

downsampling each of the subband signals to provide a first plurality of
downsampled subband signals and a second plurality of downsampled subband signals,

further subband filtering at least one of the downsampled subband signals in a
further filterbank in order to provide a plurality of sub-subband signals,

deriving spatial parameters from the sub-subband signals and from those
downsampled subband signals that are not further subband filtered, and

deriving a single channel audio signal comprising derived subband signals
derived from the first plurality of downsampled subband signals and the second plurality of
downsampled subband signals.

2. A method as claimed in claim 1, wherein for each subband that is further
subband filtered, the sub-subband signals are added together after scaling and/or phase
rotation to form a new subband signal, and wherein the single channel audio signal is derived
from these new subband signals and the downsampled subband signals that are not further
filtered.

3. A method as claimed in claim 1, wherein the further subband filtering is
performed on at least the lowest frequency subband signal of the first plurality of
downsampled subband signals and on the lowest frequency subband signal of the second
plurality of downsampled subband signals.

4. A method as claimed in claim 3, wherein the further subband filtering is
further performed on at least the next lowest frequency subband signal of the first plurality of
downsampled subband signals and on the next lowest frequency subband signal of the second
plurality of downsampled subband signals.

5. A method as claimed in claim 4, wherein the number of sub-subbands in the
lowest frequency subband signals is higher than the number of sub-subbands in the next
lowest frequency subband signals.

6. A method as claimed in claim 1, wherein the further subband filterbank is at
least partially a complex modulated filter bank.

7. A method as claimed in claim 1, wherein the further subband filterbank is at
least partially a real valued cosine modulated filter bank.

8. A method as claimed in claim 1, wherein the further subband filter bank is an
oddly stacked filter bank.

9. A method as claimed in claim 1, wherein the sub-subband signals are not
further downsampled.

10. A method as claimed in claim 1, wherein the single channel audio signal is
bandwidth limited and further coded and wherein spectral band replication parameters are
derived from the first plurality of downsampled subband signals and/or the second plurality
of downsampled subband signals.

11. An audio encoder for encoding an audio signal, the audio signal including a
first audio channel and a second audio channel, the encoder comprising:

a first complex modulated filterbank for subband filtering the first audio
channel to provide a first plurality of subband signals for the first audio channel,

a second complex modulated filterbank for subband filtering the second audio
channel to provide a second plurality of subband signals for the second audio channel,

means for downsampling each of the subband signals to provide a first
plurality of downsampled subband signals and a second plurality of downsampled subband
signals,

a further filterbank for further subband filtering at least one of the
downsampled subband signals in order to provide a plurality of sub-subband signals,

means for deriving spatial parameters from the sub-subband signals and from
those downsampled subband signals that are not further subband filtered, and

means for deriving a single channel audio signal comprising derived subband
signals derived from the first plurality of downsampled subband signals and the second
plurality of downsampled subband signals.

12. An apparatus for transmitting or storing an encoded audio signal based on an
input audio signal, the apparatus comprising:

an input unit to receive an input audio signal,

an audio encoder as claimed in claim 11 for encoding the input audio signal to
obtain an encoded audio signal,

a channel coder to further code the encoded audio signal into a format suitable
for transmitting or storing.

13. A method of decoding an encoded audio signal, the encoded audio signal
comprising an encoded single channel audio signal and a set of spatial parameters, the
method of decoding comprising:

decoding the encoded single channel audio channel to obtain a plurality of
downsampled subband signals,

further subband filtering at least one of the downsampled subband signals in a
further filterbank in order to provide a plurality of sub-subband signals, and

deriving two audio channels from the spatial parameters, the sub-subband
signals and those downsampled subband signals that are not further subband filtered.

14. A method as claimed in claim 13, wherein the further subband filtering is
performed on at least the lowest frequency subband signal of the plurality of downsampled
subband signals.

15. A method as claimed in claim 14, wherein the further subband filtering is
further performed on at least the next lowest frequency subband signal of the plurality of
downsampled subband signals.

16. A method as claimed in claim 15, wherein the number of sub-subbands in the
lowest frequency subband signals is higher than the number of sub-subbands in the next
lowest frequency subband signals.

17. A method as claimed in claim 13, wherein the further subband filter bank is at
least partially a complex modulated filter bank.

18. A method as claimed in claim 13, wherein the further subband filterbank is at
least partially a real valued cosine modulated filter bank.

19. A method as claimed in claim 13, wherein the further subband filter bank is an
oddly stacked filter bank.

20. A method as claimed in claim 13, wherein, in the lowest frequency subband,
phase modifications to the sub-subband signals having a negative center-frequency in time
domain are determined by taking the negative of the phase modification applied on a
sub-subband signal having a positive center-frequency which is in absolute value closest to said
negative center-frequency.

21. A method as claimed in claim 13, wherein the encoded audio signal comprises
spectral band replication parameters and wherein a high frequency component is derived
from the plurality of downsampled subband signals and the spectral band replication
parameters and wherein the two audio channels are derived from the spatial parameters, the
sub-subband signals, those downsampled subband signals that are not further subband filtered
and the high frequency component.

22. An audio decoder for decoding an encoded audio signal, the encoded audio
signal comprising an encoded single channel audio signal and a set of spatial parameters, the
audio decoder comprising:

a decoder for decoding the encoded single channel audio channel to obtain a
plurality of downsampled subband signals,

a further filter bank for further subband filtering at least one of the
downsampled subband signals in a further filterbank in order to provide a plurality of
sub-subband signals, and

means for deriving two audio channels from the spatial parameters, the
sub-subband signals and those downsampled subband signals that are not further subband filtered.



23. An apparatus for reproducing an output audio signal, the apparatus
comprising:

an input unit for obtaining an encoded audio signal,

an audio decoder as claimed in claim 22 for decoding the encoded audio signal
to obtain the output audio signal, and

a reproduction unit, such as a speaker or headphone output, for reproducing
the output audio signal.

24. A computer program product including code for instructing a computer to
perform the steps of the method as claimed in claim 1 or 13.

ABSTRACT:



Encoding an audio signal is provided wherein the audio signal includes a first
audio channel and a second audio channel, the encoding comprising subband filtering each of
the first audio channel and the second audio channel in a complex modulated filterbank to
provide a first plurality of subband signals for the first audio channel and a second plurality
of subband signals for the second audio channel, downsampling each of the subband signals
to provide a first plurality of downsampled subband signals and a second plurality of
downsampled subband signals, further subband filtering at least one of the downsampled
subband signals in a further filterbank in order to provide a plurality of sub-subband signals,
deriving spatial parameters from the sub-subband signals and from those downsampled
subband signals that are not further subband filtered, and deriving a single channel audio
signal comprising derived subband signals derived from the first plurality of downsampled
subband signals and the second plurality of downsampled subband signals.

Further, decoding is provided wherein an encoded audio signal comprising an
encoded single channel audio signal and a set of spatial parameters is decoded by decoding
the encoded single channel audio channel to obtain a plurality of downsampled subband
signals, further subband filtering at least one of the downsampled subband signals in a further
filterbank in order to provide a plurality of sub-subband signals, and deriving two audio
channels from the spatial parameters, the sub-subband signals and those downsampled
subband signals that are not further subband filtered.